Abstract. Auto-associative models cover a large class of methods used
in data analysis. In this paper, we describe the general properties
of these models when the projection component is linear, and we propose
and test an easy-to-implement Probabilistic Semi-Linear Auto-Associative
model in a Gaussian setting. We show that it is a generalization
of the PCA model to the semi-linear case. Numerical experiments on
simulated datasets and a real astronomical application highlight the
interest of this approach.



1. Introduction 

Principal component analysis (PCA) [29] [20] [23] is a well-established tool
for dimension reduction in multivariate data analysis. It benefits from a
simple geometrical interpretation. Given a set of n points $Y = (y_1, \dots, y_n)'$
with $y_i \in \mathbb{R}^p$ and an integer $0 < d < p$, PCA builds the d-dimensional affine
subspace minimizing the Euclidean distance to the scatter-plot [29]. The
application of principal component analysis implicitly postulates some form
of linearity. More precisely, one assumes that the data cloud is directed, and
that the data points can be well approximated by their projections onto the
affine subspace spanned by the first d principal components.

Starting from this point of view, many authors have proposed nonlinear
extensions of this technique. Principal curves or principal surfaces methods
[16] [17] E] belong to this family of approaches, as do non-linear
transformations of the original data set [3]. Auto-associative neural networks
can also be viewed as a non-linear PCA model [2] [27] HI EH]. In p2] we propose
the auto-associative models (AAM) as candidates for the generalization of
PCA, using a projection pursuit regression algorithm [9] [25] adapted to the
auto-associative case. A common point of these approaches is that they all
aim to estimate an auto-associative model, whose definition is given
hereafter.

Definition 1.1. A function g is an auto-associative function of dimension
d if it is a map from $\mathbb{R}^p$ to $\mathbb{R}^p$ that can be written $g = R \circ P$ where P
(the "Projection") is a map from $\mathbb{R}^p$ to $\mathbb{R}^d$ (generally d < p) and R (the
"Restoration" or the "Regression") is a map from $\mathbb{R}^d$ to $\mathbb{R}^p$.

An auto-associative model (AAM) of dimension d is a manifold $\mathcal{M}_g$ of
the form

\[ \mathcal{M}_g = \{ y \in \mathbb{R}^p,\ y - g(y) = 0 \} \]
where g is an auto-associative function of dimension d.

For example, the PCA constructs an auto-associative model using as
auto-associative function an orthogonal projector on an affine subspace of
dimension d. More precisely we have

\[ g(y) = m + \sum_{i=1}^{d} \langle a^i, y - m \rangle\, a^i, \qquad y \in \mathbb{R}^p, \]

with $y, m, a^i \in \mathbb{R}^p$ and the vectors $a^i$ chosen in order to maximize the
projected variance. The function g can be written $g = R \circ P$ with

\[ P(y) = \bigl( \langle a^1, y - m \rangle, \dots, \langle a^d, y - m \rangle \bigr) \]

and

\[ R(x) = m + x_1 a^1 + \dots + x_d a^d \]

with $x = (x_1, \dots, x_d)'$. The AAM is then the affine subspace given by the
following equation

\[ \mathcal{M}_g = \Bigl\{ y \in \mathbb{R}^p ;\ y - m - \sum_{i=1}^{d} \langle a^i, y - m \rangle\, a^i = 0 \Bigr\}. \]

The interested reader can check that principal curves, principal surfaces,
auto-associative neural networks, kernel PCA [31], ISOMAP [36] and local linear
embedding [30] also aim to estimate an AAM.
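
To make the PCA case concrete, the following sketch builds the maps P, R and g = R o P from the empirical mean and the leading eigenvectors of the covariance matrix. It is only an illustration under stated assumptions: the function name `pca_auto_associative` and the toy data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def pca_auto_associative(Y, d):
    """Return (P, R, g) for the PCA auto-associative function of dimension d."""
    m = Y.mean(axis=0)
    # Eigenvectors of the empirical covariance, sorted by decreasing eigenvalue.
    eigval, eigvec = np.linalg.eigh(np.cov(Y, rowvar=False))
    A = eigvec[:, np.argsort(eigval)[::-1][:d]].T   # rows are a^1, ..., a^d

    def P(y):            # projection: R^p -> R^d
        return A @ (y - m)

    def R(x):            # restoration: R^d -> R^p
        return m + A.T @ x

    def g(y):            # auto-associative function g = R o P
        return R(P(y))

    return P, R, g

# Toy data (illustrative only): an elongated Gaussian cloud in R^3.
rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.1])
P, R, g = pca_auto_associative(Y, d=2)
y_on_model = g(Y[0])     # g(y) lies on the affine subspace M_g
```

Applying g a second time leaves `y_on_model` unchanged, which is exactly the fixed-point condition y - g(y) = 0 defining the model.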

In the PCA approach the projection and the restoration functions are both
linear. It is thus natural to say that the PCA is a Linear Auto-Associative
Model. In the general case, the manifold $\mathcal{M}_g$ can be empty (i.e. the
auto-associative function g has no fixed point) or very complicated to
describe. Our aim in this paper is to study, from a theoretical and practical
point of view, the properties of some auto-associative models in an
intermediate situation between the PCA model and the general case: we will
assume that the projection function is linear and let the regression function
be arbitrary. We call the resulting AAM the Semi-Linear Auto-Associative
Models (SLAAM).

Having restricted our study to the SLAAM, we need a criterion to
optimize. As we said previously, the PCA tries to maximize the
projected variance or, equivalently, to minimize the residual variance.
Common AAM approaches also use the squared reconstruction error as criterion,
or more recently a penalized criterion [17]. However, as pointed out by M. E.
Tipping and C. M. Bishop [37], one limiting disadvantage of this approach
is the absence of a probability density model and associated likelihood
measure. The presence of a probabilistic model is desirable as

• the definition of a likelihood measure permits comparison between 
concurrent models and facilitates statistical testing, 

• a single AAM may be extended to a mixture of such models,

• if a probabilistic AAM is used to model the class conditional den- 
sities in a classification problem, the posterior probabilities of class 
membership may be computed. 

We thus propose a Gaussian generative model for the SLAAM and try to
estimate it using a maximum likelihood approach. In the general case we are
faced with a difficult optimization problem and we cannot go further without
additional assumptions. It will appear clearly that if P is known then the
estimation problem of a SLAAM is very close to an estimation problem in a
regression context. There are however some differences that we will highlight.
In particular it will appear that, in order to get tractable maximum-likelihood
estimates, we have to impose some restrictions on the noise. We call the
model resulting from all these assumptions and simplifications a Semi-Linear
Principal Component Analysis. It does not seem possible to add non-linearity
to the PCA and still get a tractable likelihood estimate for P. But clearly, the
assumption that P is known is too strong in practice. We thus propose to
estimate it in a separate step using either the PCA or a contiguity analysis
[26], extending our previous work on the auto-associative models [13].
Finally, even if P is assumed known, it remains to estimate the regression
function R, which is a non-linear function from $\mathbb{R}^d$ to $\mathbb{R}^p$. If d > 1 and p is
moderately high, the task becomes very complicated. Thus we simplify the model
once more and assume that R is additive, inspired by the Generalized
Additive Model (GAM) approach [18].

In view of the experiments we have performed and present here,
it seems that we obtain a practical and simple model which generalizes the
PCA model to the non-linear case in an understandable way.

The paper is organized as follows. Section 2 introduces the Probabilistic
Semi-Linear Auto-Associative Models (PSLAAM) and relates them to the
PCA and Probabilistic PCA models. In Section 3 we present the Probabilistic
Semi-Linear PCA models and the estimation of their parameters
conditionally on the knowledge of the projection matrix P. Section 4 is
devoted to the determination of the projection matrix P using contiguity
analysis. Data sets and experiments are detailed in Section 5, including a real
astronomical data set. Finally, some concluding remarks are proposed in
Section 6.

2. Semi-Linear Auto-Associative Models (SLAAM)



2.1. Geometrical properties of the SLAAM. Let us first consider a
general auto-associative model as given in Definition 1.1. We have the
following evident property.

Proposition 2.1. Let $H = \{P(y);\ y \in \mathcal{M}_g\} \subset \mathbb{R}^d$. On H the projection
function and the regression function verify

\[ P \circ R = \mathrm{Id}_d \tag{1} \]
where $\mathrm{Id}_d$ denotes the identity function of $\mathbb{R}^d$.

Proof. Let $y \in \mathcal{M}_g$ and let $x = P(y)$, then

\[ x = P(y) = P(g(y)) = P(R(P(y))) = P(R(x)). \]

□

As a consequence, we have the following "orthogonality" property verified
by an AAM when P is an additive function.

Proposition 2.2. Let $V = \{P(y);\ y \in \mathbb{R}^p\}$ and assume that the property (1)
extends to V. Let $y \in \mathbb{R}^p$, $\hat y = R(P(y))$ and $\varepsilon = y - \hat y$. If P is additive,
i.e. $P(y + y') = P(y) + P(y')$, then

\[ P(\varepsilon) = 0. \]

Proof. Using the property (1), we have on one hand $P(\hat y) = P(R(P(y))) = P(y)$,
while on the other hand $P(\hat y) = P(y - \varepsilon) = P(y) - P(\varepsilon)$, giving the
announced result. □
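
The property P(ε) = 0 can be checked numerically. The sketch below is an illustration only: it assumes a linear projection with orthonormal rows and a restoration function that is the identity on the projected coordinates and uses arbitrarily chosen non-linear functions on the others. All names and the particular non-linear functions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, d = 4, 2
# Random orthonormal basis of R^p; the rows of Q play the role of a^1, ..., a^p.
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
Q = Q.T
P_mat, P_bar = Q[:d], Q[d:]

def P(y):
    return P_mat @ y                      # additive (linear) projection

def R(x):
    # Identity on the first d coordinates of the basis, arbitrary non-linear
    # functions (chosen here only for illustration) on the remaining ones.
    r = np.array([np.sin(x[0]), x[0] * x[1]])
    return P_mat.T @ x + P_bar.T @ r

y = rng.normal(size=p)
y_hat = R(P(y))      # restored point
eps = y - y_hat      # residual; its projection should vanish
```

Here `P(R(x))` returns `x` for any x, so property (1) holds on the whole image of P, and `P(eps)` is zero up to rounding error.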

Clearly we have $H \subset V$ and the assumption given in this proposition
seems quite natural. We focus now on the semi-linear case and we assume
that

\[ P(y) = \bigl( \langle a^1, y \rangle, \dots, \langle a^d, y \rangle \bigr) = Py, \tag{2} \]
with $P = (a^1, \dots, a^d)'$ a matrix of size (d, p).

Proposition 2.3. Let $g = R \circ P$ be an auto-associative function, with P
given in (2) and R verifying the property (1). Let $B = (a^1, \dots, a^d, a^{d+1}, \dots, a^p)$
be an orthonormal basis of $\mathbb{R}^p$ with $(a^{d+1}, \dots, a^p)$ chosen arbitrarily. Let
$y \in \mathcal{M}_g$, and let $\tilde y$ and $\tilde r$ denote respectively the vector y and the
function R expressed in the basis B, then

\[
\begin{pmatrix} \tilde y_1 \\ \vdots \\ \tilde y_d \\ \tilde y_{d+1} \\ \vdots \\ \tilde y_p \end{pmatrix}
=
\begin{pmatrix} \tilde y_1 \\ \vdots \\ \tilde y_d \\ \tilde r_{d+1}(\tilde y_1, \dots, \tilde y_d) \\ \vdots \\ \tilde r_p(\tilde y_1, \dots, \tilde y_d) \end{pmatrix}
\tag{3}
\]

Proof. It suffices to notice that the change of basis matrix Q is given by

\[ Q' = ( a^1, \dots, a^d, a^{d+1}, \dots, a^p ), \]

thus the left multiplication of y and R by Q, using (1), will give (3). □



From this last proposition we can see that the Semi-Linear Auto-Associative
models have a relatively simple geometrical structure and that we cannot
expect to model highly non-linear structures with them.



2.2. Probabilistic Semi-Linear Auto-Associative Models (PSLAAM).

In the sequel, we will denote by V the subspace spanned by the set of
vectors $(a^1, \dots, a^d)$, and give ourselves an arbitrary orthonormal basis of
$V^\perp$ denoted by $(a^{d+1}, \dots, a^p)$. We will denote by P the matrix
$(a^1, \dots, a^d)'$ and by $\bar P$ the matrix $(a^{d+1}, \dots, a^p)'$. As in
Proposition 2.3, Q represents the unitary matrix
$(P' \,|\, \bar P')' = (a^1, \dots, a^p)'$.


2.2.1. General Gaussian Setting. 

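Definition 2.1. Let x be a d-dimensional Gaussian random vector:

\[ x \sim \mathcal{N}(\mu_x, \Sigma_x) \tag{4} \]

and let $\tilde\varepsilon$ be a p-dimensional centered Gaussian random vector with a
diagonal covariance matrix $\Sigma_{\tilde\varepsilon} = \mathrm{Diag}(\sigma_1^2, \dots, \sigma_p^2)$.

The p-dimensional vector y is a Probabilistic Semi-Linear Auto-Associative
Model (PSLAAM) if it can be written

\[
y = Q' \left( \begin{pmatrix} x_1 \\ \vdots \\ x_d \\ \tilde r_{d+1}(x) \\ \vdots \\ \tilde r_p(x) \end{pmatrix} + \tilde\varepsilon \right) = R(x) + \varepsilon,
\tag{5}
\]

where the $\tilde r_j(x)$, $d + 1 \le j \le p$, are arbitrary real functions from $\mathbb{R}^d$ to
$\mathbb{R}$.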
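
The generative definition can be turned into a small simulator. The sketch below is an assumption-laden illustration: `sample_pslaam`, its arguments, and the choice of sine and cosine for the non-linear part are hypothetical, and the noise is parametrized by the standard deviations of the coordinates of the noise in the basis B.

```python
import numpy as np

def sample_pslaam(n, Q, mu_x, Sigma_x, r_tilde, sigma_tilde, rng):
    """Draw n points y = Q'((x, r_tilde(x))' + eps_tilde), as in equation (5).

    Q           : (p, p) orthonormal matrix whose rows are a^1, ..., a^p
    r_tilde     : function mapping x in R^d to an array of length p - d
    sigma_tilde : length-p array of noise standard deviations in the basis B
    """
    p = Q.shape[0]
    x = rng.multivariate_normal(mu_x, Sigma_x, size=n)      # (n, d)
    nonlinear = np.apply_along_axis(r_tilde, 1, x)           # (n, p - d)
    eps_tilde = rng.normal(size=(n, p)) * sigma_tilde        # diagonal noise
    y_tilde = np.hstack([x, nonlinear]) + eps_tilde          # coordinates in B
    return x, y_tilde @ Q                                    # each row is Q' y_tilde

# Illustrative use: d = 1, p = 3, canonical basis, no noise on the first axis.
rng = np.random.default_rng(2)
x, Y = sample_pslaam(
    500, np.eye(3), mu_x=np.zeros(1), Sigma_x=np.array([[9.0]]),
    r_tilde=lambda x: np.array([np.sin(x[0]), np.cos(x[0])]),
    sigma_tilde=np.array([0.0, 1.0, 1.0]), rng=rng,
)
```

With a zero noise standard deviation on the first d coordinates, the projected coordinates of y coincide with x.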
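2.2.2. Link with the Principal Component Analysis. Assume that:

(1) $\tilde r_j(x) = \tilde\mu_j$ for all $j \in \{d + 1, \dots, p\}$,

(2) the covariance matrix of x, $\Sigma_x = \mathrm{Diag}(\sigma_1^2, \dots, \sigma_d^2)$, is diagonal with
$\sigma_1 > \sigma_2 > \dots > \sigma_d$,

(3) the Gaussian noise $\tilde\varepsilon$ has the following covariance matrix
$\Sigma_{\tilde\varepsilon} = \mathrm{Diag}(0, \dots, 0, \sigma^2, \dots, \sigma^2)$ with $\sigma < \sigma_d$,

then the vector y is a Gaussian random vector

\[ y \sim \mathcal{N}(\mu, \Sigma) \]

with

\[ \mu = Q' \bigl( \mu_x', \tilde\mu_{d+1}, \dots, \tilde\mu_p \bigr)' \]

and

\[ \Sigma = Q' \,\mathrm{Diag}(\sigma_1^2, \dots, \sigma_d^2, \sigma^2, \dots, \sigma^2)\, Q, \]

and $a^1, \dots, a^d$ are the d first eigenvectors given by the PCA.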



2.2.3. Link with the Probabilistic Principal Component Analysis. The
probabilistic PCA [37] is a model of the form

\[ y = \mu + Wx + \varepsilon, \tag{6} \]

with W a (p, d) matrix, x a d-dimensional isotropic Gaussian vector, i.e.
$x \sim \mathcal{N}(0, I_d)$, and $\varepsilon$ a p-dimensional centered Gaussian random vector with
covariance matrix $\sigma^2 I_p$. The law of y is not modified if W is right multiplied
by a (d, d) unitary matrix; it is thus possible to impose that the columns of W
be orthogonal (assuming that W is of full rank).

The following proposition is then straightforward.

Proposition 2.4. Assume that $\tilde\varepsilon$ (and thus $\varepsilon$) is an isotropic Gaussian
noise, i.e. $\Sigma_{\tilde\varepsilon} = \sigma^2 I_p$, take $\tilde r_j = \tilde\mu_j$ for all $d + 1 \le j \le p$ and set

\[ W = P' \,\mathrm{Diag}(\sigma_1, \dots, \sigma_d). \]

The resulting Probabilistic Semi-Linear Auto-Associative Model is a
Probabilistic Principal Component Analysis.

For this simple model there exists a closed form for the posterior probability
of y and for the maximum likelihood estimates of the parameters of the model. In
particular, the matrix W can be estimated up to a rotation and spans the
principal subspace of the data.



3. Semi-Linear PCA 



Our aim is now to generalize the PCA model presented in Section 2.2.2 to
the semi-linear case. We observe that in the PCA model, if the matrix P is
known, then we are able to know the random variable x. This observation
leads us to formulate the following hypothesis about the noise $\tilde\varepsilon$:

N: the Gaussian noise $\tilde\varepsilon$ has the following covariance matrix
$\Sigma_{\tilde\varepsilon} = \mathrm{Diag}(0, \dots, 0, \sigma^2, \dots, \sigma^2)$.

Expressing y in the basis B (Definition 2.1) we get the following expression
for $\tilde y$:

\[
\begin{pmatrix} \tilde y_1 \\ \vdots \\ \tilde y_d \\ \tilde y_{d+1} \\ \vdots \\ \tilde y_p \end{pmatrix}
=
\begin{pmatrix} x_1 \\ \vdots \\ x_d \\ \tilde r_{d+1}(x) \\ \vdots \\ \tilde r_p(x) \end{pmatrix}
+
\begin{pmatrix} 0 \\ \vdots \\ 0 \\ \tilde\varepsilon_{d+1} \\ \vdots \\ \tilde\varepsilon_p \end{pmatrix}
\tag{7}
\]

In other words, the coordinates of $\tilde y$ can be split in two sets. The d first
coordinates are the Gaussian random vector x, while the p - d remaining
coordinates are a random vector z which is, conditionally on x, a Gaussian
random vector $\mathcal{N}(\tilde r(x), \sigma^2 I_{p-d})$. Observe that the regression functions
depend on the choice of the vectors $a^{d+1}, \dots, a^p$ and that, as the noise $\varepsilon$
lives in the orthogonal of V, we have x = Py.

3.1. Maximum Likelihood Estimates. The parameters we have to estimate
are the position and correlation parameters $\mu_x$ and $\Sigma_x$ for the x part
and $(\sigma^2, \tilde r)$ for the non-linear part. Given a set of n points
$Y = (y_1, \dots, y_n)'$ in $\mathbb{R}^p$, we get by projection two sets of n points
$X = (x_1, \dots, x_n)' = YP'$ in $\mathbb{R}^d$, and $Z = (z_1, \dots, z_n)' = Y\bar P'$ in $\mathbb{R}^{p-d}$.
Standard calculations give the maximum likelihood estimates for $\mu_x$ and $\Sigma_x$:

\[ \hat\mu_x = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{8} \]

and

\[ \hat\Sigma_x = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat\mu_x)(x_i - \hat\mu_x)'. \tag{9} \]

The maximum likelihood estimate of $\sigma^2$ is given by

\[ \hat\sigma^2 = \frac{1}{n(p-d)} \sum_{i=1}^{n} \| z_i - \tilde r(x_i) \|^2, \tag{10} \]

where $\tilde r$ is replaced by its estimate.

It remains to estimate R. We consider two cases: R is a linear function,
and R is a linear combination of the elements of a B-spline function basis.
The linear case is just a toy example that we will use for comparison with the
additive B-spline case. In the non-linear case, we have to estimate a function
from $\mathbb{R}^d$ to $\mathbb{R}^{p-d}$. As we said in the introduction, this is a difficult task and
we restrict ourselves to a generalized additive model (GAM) by assuming that
the function R is additive, i.e.

\[ R(x) = \sum_{j=1}^{d} r^j(x_j), \tag{11} \]

where each $r^j$ is a map from $\mathbb{R}$ into $\mathbb{R}^{p-d}$.
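
As an illustration of these estimates, the sketch below computes the empirical mean and covariance of the projected data, together with a residual variance obtained as the mean squared residual over the n(p - d) noisy coordinates. That last form is the usual Gaussian maximum likelihood estimate and is an assumption about the exact expression intended here. The function name and the toy regression function are hypothetical.

```python
import numpy as np

def ml_estimates(Y, P_mat, P_bar, r_hat):
    """Return (mu_x, Sigma_x, sigma2) given the projection and a fitted regression.

    P_mat : (d, p) matrix with rows a^1, ..., a^d
    P_bar : (p - d, p) matrix with rows a^{d+1}, ..., a^p
    r_hat : function mapping x in R^d to an array of length p - d
    """
    X = Y @ P_mat.T                       # projected part, shape (n, d)
    Z = Y @ P_bar.T                       # orthogonal part, shape (n, p - d)
    n, q = Z.shape
    mu_x = X.mean(axis=0)
    Xc = X - mu_x
    Sigma_x = Xc.T @ Xc / n               # maximum likelihood (1/n) covariance
    resid = Z - np.apply_along_axis(r_hat, 1, X)
    sigma2 = np.sum(resid ** 2) / (n * q)
    return mu_x, Sigma_x, sigma2

# Tiny worked example in R^2 with d = 1 and a hypothetical fit r_hat(x) = x + 1.
Y = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 6.0]])
mu_x, Sigma_x, sigma2 = ml_estimates(
    Y, np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]),
    r_hat=lambda x: np.array([x[0] + 1.0]),
)
```

In this toy case the projected values are 0, 2, 4, so the mean is 2 and the variance 8/3; only the last point has a residual (equal to 1), giving a residual variance of 1/3.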



3.2. Linear Auto-Associative Models. In the linear case, we are looking
for a vector $\mu$ and a (d, p - d) matrix R minimizing

\[ \sum_{i=1}^{n} \| z_i - \mu - R' x_i \|^2. \]

It is easily verified that

\[ \hat\mu = \frac{1}{n} \sum_{i=1}^{n} (z_i - R' x_i) = \hat\mu_z - R' \hat\mu_x. \]

Set $\bar X = X - \mathbf{1}\hat\mu_x'$ and $\bar Z = Z - \mathbf{1}\hat\mu_z'$, where $\mathbf{1}$ represents the vector of size
n with 1 in every coordinate. Assuming that the matrix $\bar X'\bar X$ is invertible,
standard calculus shows that

\[ \hat R = (\bar X'\bar X)^{-1} \bar X'\bar Z. \]

Finally, using the eigenvalue decomposition of the covariance matrix
of Y, it is straightforward to verify the following theorem.

Theorem 3.1. If the d orthonormal vectors $a^1, \dots, a^d$ are the eigenvectors
associated with the d first eigenvalues of the covariance matrix of Y, then
the estimated auto-associative model is the one obtained by the PCA.
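
The theorem can be checked numerically: when the projection axes are the leading eigenvectors of the covariance matrix, the projected and orthogonal coordinates are uncorrelated, so the least-squares matrix vanishes and the fitted model reduces to the PCA reconstruction. The sketch below is an illustration on arbitrary simulated data, with hypothetical variable names.

```python
import numpy as np

rng = np.random.default_rng(3)
Y = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))   # correlated toy data
Yc = Y - Y.mean(axis=0)

# Eigenvectors of the covariance matrix, sorted by decreasing eigenvalue.
eigval, eigvec = np.linalg.eigh(Yc.T @ Yc / len(Yc))
order = np.argsort(eigval)[::-1]
d = 2
P_mat, P_bar = eigvec[:, order[:d]].T, eigvec[:, order[d:]].T

X, Z = Yc @ P_mat.T, Yc @ P_bar.T          # centered projected / orthogonal parts
R_hat = np.linalg.solve(X.T @ X, X.T @ Z)  # least-squares matrix, shape (d, p - d)

# Fitted model expressed back in the original coordinates (centered data).
Y_fit = X @ P_mat + (X @ R_hat) @ P_bar
Y_pca = Yc @ P_mat.T @ P_mat               # usual PCA reconstruction
```

Up to rounding error `R_hat` is the zero matrix, and `Y_fit` coincides with `Y_pca`.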

3.3. Additive Semi-Linear Auto-Associative Models. In order to
estimate the regression functions $r^j$, j = 1, ..., d, we express them as a linear
combination of m B-spline basis functions $s^{jl}$, where m is a number chosen by



PROBABILISTIC AUTO-ASSOCIATIVE MODELS AND SEMI-LINEAR PCA 






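the user. We have thus to estimate the set of coefficients $(\alpha_{jl})$, j = 1, ..., d,
l = 0, ..., m, by minimizing

\[ \sum_{i=1}^{n} \Bigl\| z_i - \alpha - \sum_{j=1}^{d} \sum_{l=1}^{m} \alpha_{jl}\, s^{jl}(x_{ij}) \Bigr\|^2. \]

Standard regression techniques then give the estimates

\[ \hat R(x) = \hat\alpha + \sum_{j=1}^{d} \sum_{l=1}^{m} \hat\alpha_{jl}\, s^{jl}(x_j), \qquad \text{with } \hat\alpha = (S'S)^{-1} S'Z, \]

where S is the design matrix, which depends on the knot positions, the degree of
the B-spline and the number of control points chosen by the user [14].

The estimated regression functions $\hat r^j$, for j = 1, ..., d, are then given by
the formula

\[ \hat r^j(x) = \sum_{l=1}^{m} \hat\alpha_{jl}\, s^{jl}(x). \]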

3.4. Estimation in practice. The drawback of the previous maximum
likelihood equations is that, given the projection matrix P, we have to perform
a rotation of the original data set and then an inverse rotation of
the estimated model. In practice, we avoid such computations by estimating
the model using the following steps:

• (C) Center and (optionally) standardize the data set Y: obtain $\bar Y$,

• (P) Compute the projected data set $X = \bar Y P'$ (X is centered),

• (R) Compute the regression $\bar Y \sim X$ (without intercept),

• (S) Compute the log-likelihood and the BIC criterion.

As we can see, the main difference is in the regression step: we estimate
directly a function from $\mathbb{R}^d$ to $\mathbb{R}^p$. In practice, as the non-linear part of
the model is in $V^\perp$, the regression function we obtain numerically gives the
identity function in the V space.
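
Steps (C), (P) and (R) can be sketched as follows. This is not the implementation used for the experiments: to stay self-contained it swaps the B-spline basis for a plain additive polynomial basis (powers of each projected coordinate), and the function name, the degree and the toy data are all hypothetical. Step (S) is omitted.

```python
import numpy as np

def fit_in_practice(Y, P_mat, degree=3):
    """Center, project and regress without rotating the data.

    P_mat : (d, p) projection matrix with orthonormal rows.
    """
    Yc = Y - Y.mean(axis=0)                       # (C) center
    X = Yc @ P_mat.T                              # (P) project; X is centered
    # (R) regress Yc on an additive basis of X, without intercept.
    # Powers x_j, x_j^2, ..., x_j^degree stand in for the B-spline basis.
    S = np.hstack([X ** k for k in range(1, degree + 1)])
    coef, *_ = np.linalg.lstsq(S, Yc, rcond=None)
    Y_hat = S @ coef                              # fitted values in R^p
    n, p = Y.shape
    sigma2 = np.sum((Yc - Y_hat) ** 2) / (n * (p - P_mat.shape[0]))
    return X, Y_hat, sigma2

# Toy data: a noisy curve x -> (x, sin x, cos x), projected on the first axis.
rng = np.random.default_rng(5)
x = rng.normal(0.0, 3.0, size=500)
noise = rng.normal(size=(500, 3)) * np.array([0.0, 1.0, 1.0])
Y = np.column_stack([x, np.sin(x), np.cos(x)]) + noise
P_mat = np.array([[1.0, 0.0, 0.0]])
X, Y_hat, sigma2 = fit_in_practice(Y, P_mat)
```

Because the columns of X belong to the design matrix, projecting the fitted values gives back X exactly: the fitted function acts as the identity in the V space, as stated above.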



3.5. Model Selection. Since a Semi-Linear PCA model depends highly on
the projection matrix P, model selection allows us to select the best
projection among various candidates. Several criteria for model selection
have been proposed in the literature and the most widely used are penalized
likelihood criteria. Classical tools for model selection include the AIC [1] and
BIC [33] criteria. The Bayesian Information Criterion (BIC) is certainly the most
popular and consists in selecting the model which maximizes the log-likelihood
penalized by a term proportional to $\gamma(M) \log(n)$, where $\gamma(M)$ is the number of
parameters of the model M and n is the number of observations.

In practice we will fix a set of vectors $a^1, \dots, a^{d_{\max}}$ given either by the
contiguity analysis (Section 4) or the PCA and select the dimension of the
model using the BIC criterion because of its popularity. The projection
matrices we compare are thus $P = (a^1)'$, $P = (a^1, a^2)'$, and so on.
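
A minimal sketch of BIC-based selection is given below. It uses the common convention BIC = -2 log L + γ(M) log(n), for which the smallest value is preferred. The exact scaling used for the tables of this paper is an assumption, and the candidate log-likelihoods in the example are made-up numbers for illustration only.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: -2 log L + (number of parameters) * log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_model(candidates, n_obs):
    """candidates maps a model name to (log_likelihood, n_params).

    Returns the name with the smallest BIC and the dictionary of all scores.
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical candidates: the richer model fits slightly better but pays
# a much larger penalty.
best, scores = select_model({"d=1": (-100.0, 10), "d=2": (-95.0, 30)}, n_obs=1000)
```

In this example the model "d=1" is retained, because the gain in log-likelihood of the larger model does not compensate for its 20 additional parameters.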



4. Contiguity Analysis 

Given $(a^1, \dots, a^d)$ an orthonormal set of vectors in $\mathbb{R}^p$, an index
$I: \mathbb{R}^{p \times d} \to \mathbb{R}_+$ is a functional measuring the interest of the projection
of the vector y on $\mathrm{Vec}(a^1, \dots, a^d)$ with a non-negative real number. A widely
used choice of I is $I(\langle a^1, y \rangle, \dots, \langle a^d, y \rangle) = \mathrm{tr}(\mathrm{Var}[Py])$, the projected
variance. This is the criterion maximized in the usual PCA method [23].

The choice of the index I is crucial in order to find "good" parametrization
directions for the manifold to be estimated. We refer to [21] and [24] for
a review on this topic in a projection pursuit setting. The meaning of
the word "good" depends on the considered data analysis problem. For
instance, Friedman et al. [IQllE]) and more recently Hall [15] have proposed
an index which measures the deviation from normality in order to reveal
more complex structures of the scatter plot. An alternative approach can be
found in [5], where a particular metric is introduced in PCA in order to detect
clusters. We can also mention the index dedicated to outlier detection [28].

Our approach generalizes the one we presented in [13] and consists in
defining a contiguity coefficient similar to Lebart's [26], whose maximization
allows us to unfold nonlinear structures.

A contiguity matrix is an n × n boolean matrix M whose entry $m_{ij}$ is 1 if
data points i and j are "neighbors" and 0 otherwise. Lebart proposes
to apply a threshold $r_0$ to the set of n(n - 1) distances in order to construct
this matrix, but the choice of $r_0$ can be delicate. In [13] we propose to use
a first-order contiguity matrix, i.e. $m_{ij} = 1$ iff i is the nearest neighbor of
j, in order to construct the proximity graph. In order to get a more robust
estimate of the neighbor structure, it is possible to generalize this approach
and to use a k-contiguity matrix, i.e. $m_{ij} = 1$ iff i is one of the k nearest
neighbors of j.
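
The k-contiguity matrix can be built with a brute-force nearest-neighbor search, as in the sketch below. The function name is hypothetical and the quadratic-cost distance computation is only meant for small data sets.

```python
import numpy as np

def k_contiguity_matrix(Y, k=1):
    """Boolean matrix M with M[i, j] = True iff i is one of the k nearest neighbors of j."""
    sq = np.sum(Y ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T   # squared Euclidean distances
    np.fill_diagonal(dist2, np.inf)                      # a point is not its own neighbor
    n = len(Y)
    M = np.zeros((n, n), dtype=bool)
    # nearest[r, j] is the index of the (r + 1)-th nearest neighbor of point j.
    nearest = np.argsort(dist2, axis=0)[:k]
    M[nearest, np.arange(n)] = True
    return M

# Four points on a line: 0, 1, 3, 7.
M = k_contiguity_matrix(np.array([[0.0], [1.0], [3.0], [7.0]]), k=1)
```

Note that the relation is not symmetric: in this example the point at 1 is the nearest neighbor of the point at 3, but the nearest neighbor of the point at 1 is the point at 0.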

The contiguity matrix being chosen, we compute the local covariance
matrix

\[ V^* = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{n} m_{ij}\, (y_i - y_j)(y_i - y_j)' \tag{12} \]

and the total variance matrix

\[ V = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar y)(y_i - \bar y)'. \tag{13} \]

The axes of projection are then estimated by maximizing the contiguity index

\[ I(a^1, \dots, a^d) = \sum_{i=1}^{d} \frac{a^{i\prime} V a^i}{a^{i\prime} V^* a^i}. \tag{14} \]

Using standard optimization techniques, it can be shown that the resulting
axes are the d eigenvectors associated with the largest eigenvalues of the
matrix $V^{*-1} V$.
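
The computation of the axes can be sketched as follows. The normalizing constant of the local covariance matrix does not affect the eigenvectors, so the value 1/(2n) used below is only a convention. The function name, the chain-shaped neighbor graph and the toy data are hypothetical.

```python
import numpy as np

def contiguity_axes(Y, M, d):
    """Return the d leading eigenvectors of V*^{-1} V (as unit rows), with V and V*."""
    n = len(Y)
    Yc = Y - Y.mean(axis=0)
    V = Yc.T @ Yc / n                                  # total variance matrix
    # Local covariance: sum over neighbor pairs of (y_i - y_j)(y_i - y_j)'.
    i_idx, j_idx = np.nonzero(M)
    diff = Y[i_idx] - Y[j_idx]
    V_star = diff.T @ diff / (2.0 * n)
    eigval, eigvec = np.linalg.eig(np.linalg.solve(V_star, V))
    order = np.argsort(eigval.real)[::-1][:d]
    axes = eigvec[:, order].real.T
    return axes / np.linalg.norm(axes, axis=1, keepdims=True), V, V_star

# Toy data: points along a noisy curve, sorted so that consecutive points
# are neighbors (a hypothetical chain graph used instead of a k-NN search).
rng = np.random.default_rng(4)
t = np.sort(rng.normal(0.0, 3.0, size=100))
Y = np.column_stack([t, np.sin(t), np.cos(t)]) + rng.normal(0.0, 0.1, size=(100, 3))
M = np.zeros((100, 100), dtype=bool)
idx = np.arange(99)
M[idx, idx + 1] = True
axes, V, V_star = contiguity_axes(Y, M, d=1)
```

The first returned axis maximizes the ratio a'Va / a'V*a over all directions, so its index value is at least as large as that of any coordinate axis.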



5. Examples 



We first present two illustrations of the estimation principle of PSLAAM
on low dimensional data (Sections 5.1 and 5.2). These two simulated
examples are very similar to the ones we used in our previous article with
S. Girard [13]. Second, PSLAAM is applied to an astronomical analysis
problem in Section 5.3.

In all cases, we use an additive B-spline regression model for the
estimation of the regression function R (Section 3.3). The B-splines are of
degree 3 and we select the number of control points using the BIC.



5.1. First example on simulated data. The data are simulated using a
one-dimensional regression function in $\mathbb{R}^3$. The equation of the AA model
is given by

\[ x \mapsto (x, \sin x, \cos x), \tag{15} \]

and thus P(x, y, z) = x. The first coordinate of the random vector is
sampled a thousand times from a centered Gaussian distribution with standard
deviation $\sigma_x = 3$. An independent noise with standard deviation $\sigma = 1$
has then been added to the y and z coordinates.
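
The simulation described above can be reproduced along the following lines. The function name and the seed are arbitrary, so the draws will not match the figures reported in this section.

```python
import numpy as np

def simulate_first_example(n=1000, sigma_x=3.0, sigma=1.0, seed=0):
    """Simulate n points around the curve x -> (x, sin x, cos x)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, size=n)              # first coordinate, noise free
    Y = np.column_stack([x, np.sin(x), np.cos(x)])
    Y[:, 1:] += rng.normal(0.0, sigma, size=(n, 2))   # noise on y and z only
    return x, Y

x, Y = simulate_first_example()
```

The empirical variance of the first coordinate should be close to 9, which is the order of magnitude expected for the projected variance on the first axis.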

The axis of projection has been computed by contiguity analysis
(Section 4) using the 3 nearest neighbors for the proximity graph. The
correlations between the projected data set and the original data set are

            X              Y              Z
Proj1  0.9999680850  -0.0005794581   0.0089238830

which shows that the first axis given by the contiguity analysis is very close
to the expected one. The result of the contiguity analysis can
be visualized in Figure 1.
[Figure 1, left panel: "Correlation of the Variables [1,2]"; right panel: "Plane [1,2] of the PCA with the projected values and regression functions" (horizontal axis: PCA 1).]

Figure 1. Correlation of the first axis obtained using a contiguity
analysis with the X and Y variables, and representation
of the scatter-plot and regression function in the main PCA
plane. This graphic has been obtained with R using the plot
command of the aam library.

The projected variance on the first axis is 9.20496, which is also very close
to 9. We use the BIC criterion in order to select the dimension of the
model and the number of control points. A summary of the tested models is
given in Table 1.

Finally the result of the regression is drawn in Figure 2.

Figure 2. The simulated scatter-plot (blue), the estimated
AAM (orange) and the true AAM (grey). This graphic is
obtained with R using the draw3d command of the aam library.
108887.5(44)


0.927834 


10888.3 


0.927976 




2 


11604(88) 


0.885939 







Table 1. Values of the BIC criterion for d = 1 and d = 2
and for various numbers of control points (given in the first
column). The number of free parameters of each model is
given in parentheses. The BIC criterion selects the model of
dimension 1 with 11 control points. The axis of projection
can be either the one obtained by contiguity analysis or the
one obtained using the PCA.



5.2. Second example on simulated data. In our second example the
AAM is given by

\[ (x, y) \mapsto \bigl( x,\ y,\ \cos(\pi r/3)\,(1 - \exp(-64 r^2))\,\exp(0.2 r) \bigr) \tag{16} \]

with $r = \sqrt{x^2 + y^2}$ and thus P(x, y, z) = (x, y). The first two coordinates of
the random vector are sampled from a centered Gaussian distribution with
covariance matrix




and n = 1000 points are simulated. An independent noise with standard
deviation $\sigma = 0.5$ has then been added to the z coordinate.
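
The surface of this second example can be sketched as below, reading the cosine argument as πr/3. Since the covariance matrix of the first two coordinates is not available in the text above, it is left as an argument, and the identity matrix used in the example call is only a placeholder. The function names and the seed are hypothetical.

```python
import numpy as np

def surface(x, y):
    """Third coordinate of the model: cos(pi r / 3) (1 - exp(-64 r^2)) exp(0.2 r)."""
    r = np.sqrt(x ** 2 + y ** 2)
    return np.cos(np.pi * r / 3.0) * (1.0 - np.exp(-64.0 * r ** 2)) * np.exp(0.2 * r)

def simulate_second_example(cov, n=1000, sigma=0.5, seed=0):
    """Simulate n noisy points around the surface (x, y) -> (x, y, surface(x, y))."""
    rng = np.random.default_rng(seed)
    xy = rng.multivariate_normal(np.zeros(2), cov, size=n)
    z = surface(xy[:, 0], xy[:, 1]) + rng.normal(0.0, sigma, size=n)  # noise on z only
    return np.column_stack([xy, z])

# Placeholder covariance (assumption): the value used for the experiment is not given here.
Y = simulate_second_example(cov=np.eye(2))
```

The surface depends on (x, y) only through r, which is why it cannot be written as a sum of a function of x and a function of y.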

The correlations between the projected data set and the original data are

             X             Y            Z
Proj1   0.99924737   -0.1488330    0.0179811
Proj2  -0.14239437    0.98532966   0.01396622

which shows that the (x, y)-plane is essentially the plane chosen by the
contiguity analysis. The result of the contiguity analysis is displayed in
Figure 3.




Figure 3. The AAM components and the 2-neighbors
graph, the correlation circle of the AAM components with
the variables and representation of the scatter-plot in the
PCA plane. These pictures have been obtained using the plot
command of the aam library.

The BIC criterion selects 7 control points. A summary of the tested models is
given in Table 2. The selected model over-estimates the residual variance
by a factor 2. This is not a surprising result, as the original model is not
additive and we cannot expect to reconstruct it exactly. We do not show the
results with the PCA, as the first axis selected by this method is the Z-axis,
which is clearly the wrong parametrization.



The true model and the estimated model obtained with an additive
B-spline regression are given in Figure 4.



5.3. Example in spectrometry analysis. Finally we illustrate the
performance of the semi-linear PCA on a real data set. The data consist of
19-dimensional spectral information of 487 stars [34] [12] [35] [11], which
have been classified in 6 groups. They have been modeled by [32] using an
auto-associative neural network based on a 19-30-10-30-19 network. Using
the terminology of this article, the model proposed by M. Scholz and his
co-authors is an auto-associative model of dimension 10. We select the model
using the BIC criterion. The main results are the following:

(1) The axes of projection given by the PCA largely outperform the
results we obtain with the contiguity analysis, for any choice of control
points.
Table 2. Values of the BIC criterion for d = 1 and d = 2
and for various numbers of control points (given in the first
column). The number of free parameters of each model is
given in parentheses. The BIC criterion selects the model of
dimension 2 with 7 control points.




(a) (b) (c) 

Figure 4. (a) the original data set (blue) and the individ- 
ual regression functions, (b)-(c) two views of the original 
manifold (grey) and of the extrapolated manifold using an 
additive B-Spline regression (yellow-red). These images have 
been obtained using the draw3d function of the aam library. 



(2) The BIC criterion retains a model of dimension 5 with 9 control
points (871 parameters) when we use a non-linear regression step.
The residual variance is $\sigma^2 = 0.0080763$ while the total variance
(inertia) of the data was 26.59832.

(3) The BIC criterion retains a model of dimension 12 (307 parameters)
when we use a linear regression step. Observe that in this case, we
are performing a usual PCA (Theorem 3.1).

The data cloud in the main PCA space with the values predicted by the
model is displayed in Figure 5.
Figure 5. Data cloud in the main PCA space with the
predicted values (green).

A summary of some tested models is given in Table 3 and some planes
of projection are displayed in Figure 6. We visualize each component
of the regression functions by setting all predictors except one to zero, and
we represent the evolution of the regression function in the $\mathbb{R}^{14}$ space in
Figure 7.



                          PCA
Control Points   BIC value        dim   Residual variance
Linear           1829.95 (307)    12    0.0049702
7                1187.26 (820)    6     0.00727521
8                1147.62 (776)    5     0.00975073
9                453.387 (871)    5     0.0080763
10               701.769 (966)    5     0.00768342
11               1333.45 (1061)   5     0.00773327

Table 3. Values of the BIC criterion for various numbers of
control points (given in the first column). The BIC criterion
selects the model of dimension 5 with 9 control points using
as projection matrix the 5 axes given by the PCA. The total
variance (inertia) of the data set was 26.59832.
Figure 6. Some planes of projection of the PCA with the
2-neighbors graph (left) and with the single regression
functions (right). As we use the PCA for the projection matrix,
the AAM components are the PCA components. The colors
of the points represent the classification of the stars.

Figure 7. The individual regression functions from $\mathbb{R}^5$ to
$\mathbb{R}^{14}$. In each row we have the non-zero predictor sampled in
the range [min, max], and in each column the evolution of the
functions in the dimensions 6, ..., 19. The system of coordinates
is the one given by the PCA.
6. Conclusion

We have presented a class of auto-associative models for data modeling and
visualization called semi-linear auto-associative models. We provided
theoretical grounding for these models by proving that the principal component
analysis and the probabilistic principal component analysis are special cases.
Our model allows us to model data sets with a simple non-linear component
and is truly generative, with an underlying probabilistic interpretation.
However, it does not allow us to model data with a strong non-linear component,
and it depends highly on the choice of the projection matrix.

The Semi-Linear PCA has been implemented in C++ using the stk++
library [22] and is available at: https://sourcesup.renater.fr/projects/aam/.

The program is accompanied by a set of R scripts which allow one to
simulate and display the results of the aam program.